feat(generators): add --deterministic flag for reproducible output (pyoxigraph RDFC-1.0)#1

Open
jdsika wants to merge 1 commit into main from feat/deterministic-output

@jdsika jdsika commented Mar 25, 2026

Review PR

This PR is for internal review of the `--deterministic` flag changes before submitting upstream to `linkml/linkml`.

Key changes from the closed upstream PR linkml#3295:

  1. Replaced custom Weisfeiler-Lehman algorithm with pyoxigraph RDFC-1.0 (W3C standard, Rust implementation) — addresses the core concern raised by maintainers about rolling our own canonicalization.

  2. *Collection sorting gated behind `--deterministic`*: `owl:oneOf`, `sh:in`, and `sh:ignoredProperties` items are sorted only when the flag is set. This preserves existing behaviour by default.

  3. *`deterministic_json()`*: recursive deep sort for JSON output, gated behind `--deterministic`.

  4. Covering axiom fix: abstract classes with a single child no longer get reversed hierarchy triples.
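
For reviewers unfamiliar with the idea behind item 3, here is a minimal sketch of a recursive deep sort. It is not the code in this PR: `deep_sort_json` is a hypothetical helper, and sorting list items (rather than only dict keys) is an assumption about what "deep sort" covers.

```python
import json
from typing import Any


def deep_sort_json(value: Any) -> Any:
    """Return a copy of ``value`` with dict keys and list items in a stable order.

    Hypothetical helper for illustration; not the PR's deterministic_json().
    """
    if isinstance(value, dict):
        # Rebuild the dict with keys in sorted order, recursing into values.
        return {key: deep_sort_json(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        items = [deep_sort_json(item) for item in value]
        # Assumption: list order is not significant. Items may mix types,
        # so order them by their canonical JSON text instead of comparing directly.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


doc = {"b": [{"y": 2, "x": 1}, "z", "a"], "a": {"d": None, "c": True}}
print(json.dumps(deep_sort_json(doc), indent=2))
```

Two inputs that differ only in key or item order serialise to the same text after this pass, which is what makes the output diff-friendly. Because reordering arrays can change meaning in some JSON documents, gating such a pass behind an opt-in flag is the safer default.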

Test results

  • 146 passed, 3 skipped, 8 xfailed
  • 4 xfail tests document intentional non-isomorphism: Collection sorting changes the RDF list structure (different `rdf:first`/`rdf:rest` triples) while preserving OWL/SHACL semantics.
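
To make the non-isomorphism concrete, the sketch below expands a two-member Collection into `rdf:first`/`rdf:rest` triples in plain Python and checks, by brute force over blank-node relabellings, whether the sorted and unsorted graphs match. It uses no RDF library; `ex:Colour`, the literals, and both helper functions are hypothetical and only illustrate the point.

```python
from itertools import permutations

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def collection_triples(head, items):
    """Expand an RDF Collection into triples, using blank nodes _:b0, _:b1, ..."""
    nodes = [f"_:b{i}" for i in range(len(items))]
    triples = {(head, "owl:oneOf", nodes[0] if nodes else RDF_NIL)}
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.add((nodes[i], RDF_FIRST, item))
        triples.add((nodes[i], RDF_REST, rest))
    return triples


def isomorphic(g1, g2):
    """Brute force: does some relabelling of g1's blank nodes turn it into g2?"""
    bnodes1 = sorted({t for triple in g1 for t in triple if t.startswith("_:")})
    bnodes2 = sorted({t for triple in g2 for t in triple if t.startswith("_:")})
    if len(bnodes1) != len(bnodes2):
        return False
    for perm in permutations(bnodes2):
        mapping = dict(zip(bnodes1, perm))
        relabelled = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if relabelled == g2:
            return True
    return False


members = ['"red"', '"blue"']
original = collection_triples("ex:Colour", members)
reordered = collection_triples("ex:Colour", sorted(members))

print(isomorphic(original, original))   # same graph
print(isomorphic(original, reordered))  # same members, different list order
```

The member set is identical in both graphs, so the enumeration means the same thing to an OWL or SHACL processor, but no blank-node relabelling maps one list onto the other. A graph-isomorphism test is therefore expected to fail once members are reordered.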

New dependency

  • `pyoxigraph >= 0.4.0`

See `.playground/pr-3295-description.md` for the full upstream PR description draft.

@jdsika jdsika force-pushed the feat/deterministic-output branch from bdb0f7a to 6544b72 Compare March 25, 2026 16:03
@jdsika jdsika self-assigned this Mar 25, 2026
Add a --deterministic flag to OWL, SHACL, JSON-LD, and JSON-LD Context
generators that produces stable, reproducible output suitable for
version-controlled artifacts.

When enabled, the flag activates:

1. **RDFC-1.0 blank-node canonicalization** via pyoxigraph (W3C
   Recommendation) for Turtle serialisation of OWL and SHACL graphs.
2. **Deterministic Collection ordering** — RDF Collections (owl:oneOf,
   sh:in, sh:ignoredProperties) are sorted so that enum members and
   property lists appear in a stable order.  This intentionally changes
   the RDF graph (Collections encode order at the triple level) and is
   therefore opt-in.
3. **Deterministic JSON key ordering** for JSON-LD and JSON-LD Context
   output, with structure-aware sorting that preserves JSON-LD
   conventions (@context directives first, then prefixes, then terms).
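
The ordering in item 3 can be sketched as a three-group sort key. This is an illustration only, not the generator's code: `context_sort_key` and `sort_context` are hypothetical names, and treating any plain string value ending in "/" or "#" as a prefix declaration is an assumption made for the example.

```python
import json


def context_sort_key(item):
    """Order entries as: @-keywords, then prefixes, then terms (alphabetical within each)."""
    key, value = item
    if key.startswith("@"):
        group = 0  # JSON-LD keywords such as @vocab or @base
    elif isinstance(value, str) and value.endswith(("/", "#")):
        group = 1  # assumption: a namespace-like IRI string marks a prefix
    else:
        group = 2  # everything else is treated as a term definition
    return (group, key)


def sort_context(context):
    """Return a new dict with the context entries in the grouped order."""
    return dict(sorted(context.items(), key=context_sort_key))


context = {
    "name": {"@id": "schema:name"},
    "schema": "http://schema.org/",
    "@vocab": "https://example.org/",
    "age": {"@id": "ex:age", "@type": "xsd:integer"},
    "ex": "https://example.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
print(json.dumps({"@context": sort_context(context)}, indent=2))
```

A plain alphabetical sort would interleave prefixes and terms. Grouping first keeps the layout readers of JSON-LD contexts usually expect while still giving a stable order.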

The flag defaults to False to preserve backward compatibility.  Four
tests are marked xfail(strict=True) to document that deterministic
Collection sorting intentionally produces non-isomorphic output.

New dependency: pyoxigraph >= 0.4.0 (Rust-based, W3C RDFC-1.0).

Refs:
- W3C (2024) RDF Dataset Canonicalization (RDFC-1.0)
  https://www.w3.org/TR/rdf-canon/

Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
@jdsika jdsika force-pushed the feat/deterministic-output branch from 6544b72 to 6ca468c Compare March 25, 2026 17:38